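Motivation:
Exam scores have long been one of the indicators of learning performance, and are often the main route to further education.
"Educational equality" has long been a goal of social development; yet beyond a student's own effort and talent, many external factors also affect academic performance.
Which external factors, then, influence students' grades?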
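Dataset overview
Dataset: Student Alcohol Consumption (2008)
Source: Kaggle
Author: UCI MACHINE LEARNING
Description: a survey of students taking mathematics and Portuguese language courses at two secondary schools in Portugal.
This analysis uses the mathematics dataset.
It covers sex, age, family background, living environment, lifestyle habits, alcohol consumption, health status, and more.
The dataset has 33 columns and a sample of 395 students.
Dataset URL: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption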
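Field descriptions
| Field | Type | Description |
|---|---|---|
| school | binary | School (GP or MS) |
| sex | binary | Student's sex (F - female or M - male) |
| age | numeric | Student's age |
| address | binary | Home address type (U - urban or R - rural) |
| famsize | binary | Family size (LE3 - 3 or fewer, GT3 - more than 3) |
| Pstatus | binary | Parents' cohabitation status (T - living together or A - apart) |
| Medu | numeric | Mother's education (0: none, 1: primary education (4th grade), 2: 5th to 9th grade, 3: secondary education, 4: higher education) |
| Fedu | numeric | Father's education (0: none, 1: primary education (4th grade), 2: 5th to 9th grade, 3: secondary education, 4: higher education) |
| Mjob | character | Mother's job (teacher, health: health care related, services: civil services such as administrative or police, at_home, or other) |
| Fjob | character | Father's job (teacher, health: health care related, services: civil services such as administrative or police, at_home, or other) |
| reason | character | Reason for choosing this school (home: close to home, reputation: school reputation, course: course preference, or other) |
| guardian | character | Student's guardian (mother, father, or other) |
| traveltime | numeric | Home-to-school travel time (1: <15 min, 2: 15 to 30 min, 3: 30 min to 1 hour, 4: >1 hour) |
| studytime | numeric | Weekly study time (1: <2 hours, 2: 2 to 5 hours, 3: 5 to 10 hours, 4: >10 hours) |
| failures | numeric | Number of past class failures (courses not passed) |
| schoolsup | binary | Extra educational support (yes or no) |
| famsup | binary | Family educational support (yes or no) |
| paid | binary | Extra paid classes within the course subject (yes or no) |
| activities | binary | Extracurricular activities (yes or no) |
| nursery | binary | Attended nursery school (yes or no) |
| higher | binary | Wants to pursue higher education (yes or no) |
| internet | binary | Internet access at home (yes or no) |
| romantic | binary | In a romantic relationship (yes or no) |
| famrel | numeric | Quality of family relationships (1: very bad to 5: excellent) |
| freetime | numeric | Free time after school (1: very low to 5: very high) |
| goout | numeric | Frequency of going out with friends (1: very low to 5: very high) |
| Dalc | numeric | Workday alcohol consumption (1: very low to 5: very high) |
| Walc | numeric | Weekend alcohol consumption (1: very low to 5: very high) |
| health | numeric | Health status (1: very bad to 5: very good) |
| absences | numeric | Number of school absences (0 to 93) |
| G1 | numeric | First period grade (0 to 20) |
| G2 | numeric | Second period grade (0 to 20) |
| G3 | numeric | Final grade (0 to 20) |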
school sex age address
Length:395 Length:395 Min. :15.0 Length:395
Class :character Class :character 1st Qu.:16.0 Class :character
Mode :character Mode :character Median :17.0 Mode :character
Mean :16.7
3rd Qu.:18.0
Max. :22.0
famsize Pstatus Medu Fedu
Length:395 Length:395 Min. :0.000 Min. :0.000
Class :character Class :character 1st Qu.:2.000 1st Qu.:2.000
Mode :character Mode :character Median :3.000 Median :2.000
Mean :2.749 Mean :2.522
3rd Qu.:4.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000
Mjob Fjob reason guardian
Length:395 Length:395 Length:395 Length:395
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
traveltime studytime failures schoolsup
Min. :1.000 Min. :1.000 Min. :0.0000 Length:395
1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000 Class :character
Median :1.000 Median :2.000 Median :0.0000 Mode :character
Mean :1.448 Mean :2.035 Mean :0.3342
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:0.0000
Max. :4.000 Max. :4.000 Max. :3.0000
famsup paid activities nursery
Length:395 Length:395 Length:395 Length:395
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
higher internet romantic famrel
Length:395 Length:395 Length:395 Min. :1.000
Class :character Class :character Class :character 1st Qu.:4.000
Mode :character Mode :character Mode :character Median :4.000
Mean :3.944
3rd Qu.:5.000
Max. :5.000
freetime goout Dalc Walc
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
Median :3.000 Median :3.000 Median :1.000 Median :2.000
Mean :3.235 Mean :3.109 Mean :1.481 Mean :2.291
3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3.000
Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
health absences G1 G2
Min. :1.000 Min. : 0.000 Min. : 3.00 Min. : 0.00
1st Qu.:3.000 1st Qu.: 0.000 1st Qu.: 8.00 1st Qu.: 9.00
Median :4.000 Median : 4.000 Median :11.00 Median :11.00
Mean :3.554 Mean : 5.709 Mean :10.91 Mean :10.71
3rd Qu.:5.000 3rd Qu.: 8.000 3rd Qu.:13.00 3rd Qu.:13.00
Max. :5.000 Max. :75.000 Max. :19.00 Max. :19.00
G3 gmeans
Min. : 0.00 Min. : 1.333
1st Qu.: 8.00 1st Qu.: 8.333
Median :11.00 Median :10.667
Mean :10.42 Mean :10.679
3rd Qu.:14.00 3rd Qu.:13.333
Max. :20.00 Max. :19.333
-> Most of the data consists of categorical variables and numeric variables on 1 to 5 scales.
'data.frame': 395 obs. of 34 variables:
$ school : chr "GP" "GP" "GP" "GP" ...
$ sex : chr "F" "F" "F" "F" ...
$ age : int 18 17 15 15 16 16 16 17 15 15 ...
$ address : chr "U" "U" "U" "U" ...
$ famsize : chr "GT3" "GT3" "LE3" "GT3" ...
$ Pstatus : chr "A" "T" "T" "T" ...
$ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
$ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
$ Mjob : chr "at_home" "at_home" "at_home" "health" ...
$ Fjob : chr "teacher" "other" "other" "services" ...
$ reason : chr "course" "course" "other" "home" ...
$ guardian : chr "mother" "father" "mother" "mother" ...
$ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
$ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
$ failures : int 0 0 3 0 0 0 0 0 0 0 ...
$ schoolsup : chr "yes" "no" "yes" "no" ...
$ famsup : chr "no" "yes" "no" "yes" ...
$ paid : chr "no" "no" "yes" "yes" ...
$ activities: chr "no" "no" "no" "yes" ...
$ nursery : chr "yes" "no" "yes" "yes" ...
$ higher : chr "yes" "yes" "yes" "yes" ...
$ internet : chr "no" "yes" "yes" "yes" ...
$ romantic : chr "no" "no" "no" "yes" ...
$ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
$ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
$ goout : int 4 3 2 2 2 2 4 4 2 1 ...
$ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
$ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
$ health : int 3 3 3 5 5 5 3 1 1 5 ...
$ absences : int 6 4 10 2 4 10 0 6 0 0 ...
$ G1 : int 5 5 7 15 6 15 12 6 16 14 ...
$ G2 : int 6 5 8 14 10 15 12 5 18 15 ...
$ G3 : int 6 6 10 15 10 15 11 6 19 15 ...
$ gmeans : num 5.67 5.33 8.33 14.67 8.67 ...
[1] "Number of empty values: 0"
[1] 0
-> The data is checked for empty and missing values and the counts are printed; this dataset has neither.
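The `gmeans` column in the summary and structure listings is not one of the 33 original fields, which is why the data frame reports 34 variables. Its printed values are consistent with the row-wise mean of G1, G2, and G3. That reading is inferred from the output, not stated in the report. A minimal Python sketch that checks it against the first five rows of the structure listing:

```python
# First five rows of G1, G2, G3 and gmeans as printed in the str() listing.
g1 = [5, 5, 7, 15, 6]
g2 = [6, 5, 8, 14, 10]
g3 = [6, 6, 10, 15, 10]
printed_gmeans = [5.67, 5.33, 8.33, 14.67, 8.67]

# Assumption: gmeans is the plain average of the three period grades.
computed = [round((a + b + c) / 3, 2) for a, b, c in zip(g1, g2, g3)]

# Compare with a small tolerance instead of exact float equality.
matches = all(abs(c - p) < 1e-9 for c, p in zip(computed, printed_gmeans))
print(computed, matches)
```

If the assumption holds, every later analysis of "grade performance" uses this three-period average, not the final grade G3 alone.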
[,1]
age -0.134589374
Medu 0.224259868
Fedu 0.175852135
traveltime -0.128197163
studytime 0.134564719
failures -0.375758896
famrel 0.021652521
freetime 0.003773140
goout -0.154511336
Dalc -0.072508178
Walc -0.088024671
health -0.080380376
absences -0.005908806
-> The correlation of each numeric variable with grade performance (gmeans) is shown above.
age Medu Fedu traveltime studytime
age 1.000000000 -0.163658419 -0.163438069 0.070640721 -0.004140037
Medu -0.163658419 1.000000000 0.623455112 -0.171639305 0.064944137
Fedu -0.163438069 0.623455112 1.000000000 -0.158194054 -0.009174639
traveltime 0.070640721 -0.171639305 -0.158194054 1.000000000 -0.100909119
studytime -0.004140037 0.064944137 -0.009174639 -0.100909119 1.000000000
failures 0.243665377 -0.236679963 -0.250408444 0.092238746 -0.173563031
famrel 0.053940096 -0.003914458 -0.001369727 -0.016807986 0.039730704
freetime 0.016434389 0.030890867 -0.012845528 -0.017024944 -0.143198407
goout 0.126963880 0.064094438 0.043104668 0.028539674 -0.063903675
Dalc 0.131124605 0.019834099 0.002386429 0.138325309 -0.196019263
Walc 0.117276052 -0.047123460 -0.012631018 0.134115752 -0.253784731
health -0.062187369 -0.046877829 0.014741537 0.007500606 -0.075615863
absences 0.175230079 0.100284818 0.024472887 -0.012943775 -0.062700175
failures famrel freetime goout Dalc
age 0.24366538 0.053940096 0.01643439 0.126963880 0.131124605
Medu -0.23667996 -0.003914458 0.03089087 0.064094438 0.019834099
Fedu -0.25040844 -0.001369727 -0.01284553 0.043104668 0.002386429
traveltime 0.09223875 -0.016807986 -0.01702494 0.028539674 0.138325309
studytime -0.17356303 0.039730704 -0.14319841 -0.063903675 -0.196019263
failures 1.00000000 -0.044336626 0.09198747 0.124560922 0.136046931
famrel -0.04433663 1.000000000 0.15070144 0.064568411 -0.077594357
freetime 0.09198747 0.150701444 1.00000000 0.285018715 0.209000848
goout 0.12456092 0.064568411 0.28501871 1.000000000 0.266993848
Dalc 0.13604693 -0.077594357 0.20900085 0.266993848 1.000000000
Walc 0.14196203 -0.113397308 0.14782181 0.420385745 0.647544230
health 0.06582728 0.094055728 0.07573336 -0.009577254 0.077179582
absences 0.06372583 -0.044354095 -0.05807792 0.044302220 0.111908026
Walc health absences
age 0.11727605 -0.062187369 0.17523008
Medu -0.04712346 -0.046877829 0.10028482
Fedu -0.01263102 0.014741537 0.02447289
traveltime 0.13411575 0.007500606 -0.01294378
studytime -0.25378473 -0.075615863 -0.06270018
failures 0.14196203 0.065827282 0.06372583
famrel -0.11339731 0.094055728 -0.04435409
freetime 0.14782181 0.075733357 -0.05807792
goout 0.42038575 -0.009577254 0.04430222
Dalc 0.64754423 0.077179582 0.11190803
Walc 1.00000000 0.092476317 0.13629110
health 0.09247632 1.000000000 -0.02993671
absences 0.13629110 -0.029936711 1.00000000
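1. Workday alcohol consumption (Dalc) and weekend alcohol consumption (Walc) are fairly strongly correlated (r = 0.65).
2. Mother's education (Medu) and father's education (Fedu) are also fairly strongly correlated (r = 0.62).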
Df Sum Sq Mean Sq F value Pr(>F)
Walc 1 42 41.72 3.069 0.0806 .
Residuals 393 5343 13.59
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
Dalc 1 28 28.31 2.077 0.15
Residuals 393 5356 13.63
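-> Neither weekend nor workday alcohol consumption shows a significant effect on grade performance at the 0.05 level (p = 0.0806 and p = 0.15).
Df Sum Sq Mean Sq F value Pr(>F)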
Fedu 1 167 166.51 12.54 0.000446 ***
Residuals 393 5218 13.28
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
Medu 1 271 270.80 20.81 6.78e-06 ***
Residuals 393 5114 13.01
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-> Both father's and mother's education show a significant effect on grade performance (p < 0.001 in each case).
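Each of the four ANOVA tables above has a single model degree of freedom, so each fit is equivalent to a simple linear regression on that variable. Two quantities can therefore be re-derived from the printed sums of squares: the F value, and the share of variance explained, which should equal the squared correlation with gmeans from the earlier correlation column. The following Python sketch only re-does that arithmetic on the printed (rounded) figures; the function names are illustrative.

```python
# name: (model sum of squares, residual sum of squares, residual df,
#        printed correlation with gmeans)
tables = {
    "Walc": (41.72, 5343.0, 393, -0.088024671),
    "Dalc": (28.31, 5356.0, 393, -0.072508178),
    "Fedu": (166.51, 5218.0, 393, 0.175852135),
    "Medu": (270.80, 5114.0, 393, 0.224259868),
}


def f_statistic(ss_model, ss_resid, df_resid, df_model=1):
    """F = (SS_model / df_model) / (SS_resid / df_resid)."""
    return (ss_model / df_model) / (ss_resid / df_resid)


def r_squared(ss_model, ss_resid):
    """Share of total variation explained by the model."""
    return ss_model / (ss_model + ss_resid)


results = {}
for name, (ss_m, ss_r, df_r, r) in tables.items():
    results[name] = (
        f_statistic(ss_m, ss_r, df_r),  # compare with the printed F value
        r_squared(ss_m, ss_r),          # explained share from the ANOVA table
        r * r,                          # squared correlation from the matrix
    )
    print(name, [round(v, 4) for v in results[name]])
```

Even for Medu, the strongest of the four, the explained share is only about 5%, which is in line with the low explanatory power reported for the regression models.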
Call:
lm(formula = gmeans ~ age + Medu + Fedu + traveltime + studytime +
failures + famrel + freetime + goout + health + absences,
data = student_data)
Residuals:
Min 1Q Median 3Q Max
-9.0569 -2.1228 0.1757 2.2471 8.6242
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.713276 2.661103 4.402 1.40e-05 ***
age -0.066310 0.142430 -0.466 0.64179
Medu 0.423930 0.204181 2.076 0.03854 *
Fedu 0.053162 0.203646 0.261 0.79419
traveltime -0.325462 0.249287 -1.306 0.19248
studytime 0.291015 0.210385 1.383 0.16739
failures -1.521991 0.249374 -6.103 2.55e-09 ***
famrel 0.038369 0.193661 0.198 0.84305
freetime 0.301203 0.182396 1.651 0.09948 .
goout -0.469382 0.162040 -2.897 0.00399 **
health -0.155294 0.124187 -1.250 0.21189
absences 0.008149 0.021924 0.372 0.71034
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.369 on 383 degrees of freedom
Multiple R-squared: 0.1927, Adjusted R-squared: 0.1695
F-statistic: 8.313 on 11 and 383 DF, p-value: 3.968e-13
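Because Dalc and Walc are highly correlated with each other but show no significant effect on grades, they are excluded from the model. The model below is refit with a reduced set of six predictors.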
Call:
lm(formula = gmeans ~ age + Medu + Fedu + failures + traveltime +
goout, data = student_data)
Residuals:
Min 1Q Median 3Q Max
-9.2480 -2.1065 -0.0019 2.2821 8.3447
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.332195 2.442666 5.049 6.86e-07 ***
age -0.039885 0.139294 -0.286 0.7748
Medu 0.483759 0.201271 2.404 0.0167 *
Fedu -0.003631 0.202249 -0.018 0.9857
failures -1.577369 0.244427 -6.453 3.27e-10 ***
traveltime -0.371541 0.248266 -1.497 0.1353
goout -0.399720 0.155660 -2.568 0.0106 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.372 on 388 degrees of freedom
Multiple R-squared: 0.1806, Adjusted R-squared: 0.1679
F-statistic: 14.25 on 6 and 388 DF, p-value: 1.077e-14
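The overall F-test p-value is smaller (1.077e-14 versus 3.968e-13), but the model's explanatory power is still low (R-squared of about 0.18).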
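The adjusted R-squared and the overall F statistic of both models follow directly from the multiple R-squared, the sample size n = 395, and the number of predictors p. A short Python sketch of the standard formulas, applied to the printed values (11 predictors in the first model, 6 in the second):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


n = 395
models = {
    "full": {"r2": 0.1927, "p": 11},     # first lm() summary
    "reduced": {"r2": 0.1806, "p": 6},   # second lm() summary
}

checks = {}
for label, m in models.items():
    checks[label] = (adjusted_r2(m["r2"], n, m["p"]), overall_f(m["r2"], n, m["p"]))
    print(label, round(checks[label][0], 4), round(checks[label][1], 2))
```

Dropping five predictors raises the F statistic mainly because the same explained variation is spread over fewer degrees of freedom; the adjusted R-squared barely moves (0.1695 versus 0.1679), so the smaller p-value does not indicate a better-fitting model.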
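Summary
1. The regression model has low explanatory power and is probably not well suited to this dataset: most values are ordinal levels and many fields are categorical, so grade performance cannot be predicted well from the numeric fields alone.
2. Among the numeric fields, past failures, mother's education, and frequency of going out with friends are the significant predictors in the regression. Father's education is significant on its own (one-way ANOVA) but not once mother's education is in the model.
3. Next, the categorical fields are tested to see whether any of them has a marked effect on grade performance.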
F test to compare two variances
data: stu_female$gmeans and stu_male$gmeans
F = 0.9252, num df = 207, denom df = 186, p-value = 0.5849
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.6979068 1.2239012
sample estimates:
ratio of variances
0.9251996
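-> The hypothesis of equal variances (var1 = var2) is not rejected (p = 0.5849), so the pooled-variance t-test is used.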
Two Sample t-test
data: student_data[, "gmeans"] by student_data[, "sex"]
t = -2.015, df = 393, p-value = 0.04459
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
-1.4773516 -0.0181749
sample estimates:
mean in group F mean in group M
10.32532 11.07308
-> p-value < 0.05, so mean mathematics grades differ by sex (M: 11.07, F: 10.33).
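The figures printed by the t-test are internally linked: the confidence interval is centred on the difference in group means, the standard error is that difference divided by t, and the interval half-width divided by the standard error gives the critical value that was used. The Python sketch below recovers these from the printed numbers as a consistency check; it does not rerun the test on the data.

```python
# Values printed in the two-sample t-test by sex.
mean_f, mean_m = 10.32532, 11.07308
t_stat = -2.015
ci_low, ci_high = -1.4773516, -0.0181749

diff = mean_f - mean_m                       # point estimate (F minus M)
ci_mid = (ci_low + ci_high) / 2              # centre of the interval
std_err = diff / t_stat                      # since t = diff / SE
t_crit = (ci_high - ci_low) / 2 / std_err    # implied two-sided 95% critical value

print(round(diff, 4), round(ci_mid, 4), round(std_err, 4), round(t_crit, 3))
```

The implied critical value is close to 1.97, as expected for a two-sided 95% interval with 393 degrees of freedom. The upper bound of the interval is only about 0.02 below zero, so the difference is significant at the 0.05 level but not by a wide margin.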
F test to compare two variances
data: stu_radd$gmeans and stu_uadd$gmeans
F = 1.0232, num df = 87, denom df = 306, p-value = 0.8685
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7412387 1.4587802
sample estimates:
ratio of variances
1.023217
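-> The hypothesis of equal variances (var1 = var2) is not rejected (p = 0.8685), so the pooled-variance t-test is used.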
Two Sample t-test
data: student_data[, "gmeans"] by student_data[, "address"]
t = -2.1394, df = 393, p-value = 0.03302
alternative hypothesis: true difference in means between group R and group U is not equal to 0
95 percent confidence interval:
-1.82688587 -0.07717099
sample estimates:
mean in group R mean in group U
9.939394 10.891422
-> p-value < 0.05, so mean mathematics grades differ by home address type (U: 10.89, R: 9.94).
Df Sum Sq Mean Sq F value Pr(>F)
Mjob 4 235 58.77 4.451 0.00158 **
Residuals 390 5149 13.20
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-> Grade performance differs significantly across mother's job categories (p = 0.00158).
Tukey multiple comparisons of means
99% family-wise confidence level
Fit: aov(formula = gmeans ~ Mjob, data = student_data)
$Mjob
diff lwr upr p adj
health-at_home 2.4725823 -0.09163225 5.03679675 0.0145850
other-at_home 0.2963898 -1.55014620 2.14292578 0.9846718
services-at_home 1.4444079 -0.50001728 3.38883303 0.1083584
teacher-at_home 1.5074031 -0.69466768 3.70947384 0.1660319
other-health -2.1761925 -4.45154320 0.09915828 0.0158144
services-health -1.0281744 -3.38366053 1.32731177 0.6082512
teacher-health -0.9651792 -3.53746249 1.60710414 0.7339905
services-other 1.1480181 -0.39561858 2.69165475 0.1076394
teacher-other 1.2110133 -0.64671129 3.06873787 0.2067748
teacher-services 0.0629952 -1.89205841 2.01804881 0.9999718
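-> The largest difference is between mothers in health-care-related jobs and mothers at home (diff = 2.47, p adj = 0.0146). The intervals are drawn at the 99% family-wise level, which is why they still include 0.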
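Because the Tukey intervals above are 99% family-wise intervals, a pair can have an adjusted p-value below 0.05 while its interval still contains zero. The Python sketch below sorts the printed pairs by both criteria; the helper names are illustrative.

```python
# (pair, lower 99% bound, upper 99% bound, adjusted p) from the Tukey table.
tukey = [
    ("health-at_home", -0.09163225, 5.03679675, 0.0145850),
    ("other-at_home", -1.55014620, 2.14292578, 0.9846718),
    ("services-at_home", -0.50001728, 3.38883303, 0.1083584),
    ("teacher-at_home", -0.69466768, 3.70947384, 0.1660319),
    ("other-health", -4.45154320, 0.09915828, 0.0158144),
    ("services-health", -3.38366053, 1.32731177, 0.6082512),
    ("teacher-health", -3.53746249, 1.60710414, 0.7339905),
    ("services-other", -0.39561858, 2.69165475, 0.1076394),
    ("teacher-other", -0.64671129, 3.06873787, 0.2067748),
    ("teacher-services", -1.89205841, 2.01804881, 0.9999718),
]


def significant(rows, alpha):
    """Pairs whose adjusted p-value is below alpha."""
    return [pair for pair, _, _, p in rows if p < alpha]


def excludes_zero(rows):
    """Pairs whose printed interval lies entirely on one side of zero."""
    return [pair for pair, lo, hi, _ in rows if lo > 0 or hi < 0]


sig_05 = significant(tukey, 0.05)
sig_01 = significant(tukey, 0.01)
outside = excludes_zero(tukey)
print(sig_05, sig_01, outside)
```

Two pairs, health versus at_home and other versus health, are significant at the 0.05 level, and none at the 0.01 level. That matches the fact that every printed 99% interval contains zero.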
[[1]]
[[1]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 5.0 6 3 5 6
[2,] 8.0 10 8 8 9
[3,] 10.0 13 10 11 11
[4,] 12.5 14 13 14 14
[5,] 18.0 19 19 19 18
[[1]]$n
[1] 59 34 141 103 58
[[1]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.074357 11.91613 9.3347 10.06591 9.962679
[2,] 10.925643 14.08387 10.6653 11.93409 12.037321
[[1]]$out
numeric(0)
[[1]]$group
numeric(0)
[[1]]$names
[1] "at_home" "health" "other" "services" "teacher"
[[2]]
[[2]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 5 7 5 4 5
[2,] 8 10 8 9 9
[3,] 10 13 10 12 11
[4,] 12 15 13 14 14
[5,] 18 19 18 18 19
[[2]]$n
[1] 59 34 141 103 58
[[2]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.177206 11.64516 9.3347 11.22159 9.962679
[2,] 10.822794 14.35484 10.6653 12.77841 12.037321
[[2]]$out
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0
[[2]]$group
[1] 1 1 1 3 3 3 3 3 4 4 4 4 5
[[2]]$names
[1] "at_home" "health" "other" "services" "teacher"
[[3]]
[[3]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 5.0 8 4 5.0 6
[2,] 8.0 10 8 9.0 9
[3,] 10.0 13 11 11.0 11
[4,] 12.5 15 13 14.5 14
[5,] 19.0 20 19 19.0 19
[[3]]$n
[1] 59 34 141 103 58
[[3]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.074357 11.64516 10.3347 10.14375 9.962679
[2,] 10.925643 14.35484 11.6653 11.85625 12.037321
[[3]]$out
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[[3]]$group
[1] 1 1 1 1 1 1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 5 5
[[3]]$names
[1] "at_home" "health" "other" "services" "teacher"
[[4]]
[[4]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 2.666667 4.333333 1.333333 2.333333 2.00000
[2,] 7.500000 9.666667 8.000000 9.000000 9.00000
[3,] 9.666667 12.833333 10.000000 11.333333 10.83333
[4,] 12.500000 14.666667 12.666667 13.666667 14.33333
[5,] 18.333333 19.333333 18.666667 18.333333 18.66667
[[4]]$n
[1] 59 34 141 103 58
[[4]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 8.638174 11.47849 9.379053 10.60682 9.726858
[2,] 10.695159 14.18817 10.620947 12.05985 11.939809
[[4]]$out
[1] 1.666667
[[4]]$group
[1] 4
[[4]]$names
[1] "at_home" "health" "other" "services" "teacher"
-> Students whose mothers work in health-care-related jobs perform best.
-> Students whose mothers are at home perform worst.
TrainFlag
FALSE TRUE
at_home 15 44
health 8 26
other 35 106
services 26 77
teacher 14 44
$Fold01
[1] 0.25
$Fold02
[1] 0.45
$Fold03
[1] 0.3
$Fold04
[1] 0.35
$Fold05
[1] 0.4
$Fold06
[1] 0.15
$Fold07
[1] 0.35
$Fold08
[1] 0.1904762
$Fold09
[1] 0.3
$Fold10
[1] 0.35
$Fold11
[1] 0.4210526
$Fold12
[1] 0.2105263
$Fold13
[1] 0.15
$Fold14
[1] 0.2
$Fold15
[1] 0.5555556
$Fold16
[1] 0.3157895
$Fold17
[1] 0.45
$Fold18
[1] 0.15
$Fold19
[1] 0.25
$Fold20
[1] 0.4210526
[1] 0.2500000 0.4500000 0.3000000 0.3500000 0.4000000 0.1500000 0.3500000
[8] 0.1904762 0.3000000 0.3500000 0.4210526 0.2105263 0.1500000 0.2000000
[15] 0.5555556 0.3157895 0.4500000 0.1500000 0.2500000 0.4210526
[1] 0.3107226
-> Training data 75%, test data 25%
-> Over the 20 runs listed above, the best accuracy is 0.56, the worst is 0.15, and the mean is 0.31, so accuracy is low.
Df Sum Sq Mean Sq F value Pr(>F)
Fjob 4 105 26.27 1.941 0.103
Residuals 390 5279 13.54
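-> Grade performance does not differ significantly across father's job categories at the 0.05 level (p = 0.103).
-> The largest gap is between fathers who are teachers and fathers in the "other" job category.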
[[1]]
[[1]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 5.0 6 4 3 5
[2,] 9.0 10 8 8 10
[3,] 11.5 11 10 11 14
[4,] 14.5 14 13 13 16
[5,] 18.0 17 19 19 19
[[1]]$n
[1] 20 18 217 111 29
[[1]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.556857 9.510362 9.463713 10.25017 12.23961
[2,] 13.443143 12.489638 10.536287 11.74983 15.76039
[[1]]$out
numeric(0)
[[1]]$group
numeric(0)
[[1]]$names
[1] "at_home" "health" "other" "services" "teacher"
[[2]]
[[2]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 5 6.0 4 5 0
[2,] 9 9.0 8 9 9
[3,] 11 11.5 10 11 13
[4,] 14 14.0 13 13 16
[5,] 18 17.0 19 19 19
[[2]]$n
[1] 20 18 217 111 29
[[2]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.233506 9.637952 9.463713 10.40013 10.94621
[2,] 12.766494 13.362048 10.536287 11.59987 15.05379
[[2]]$out
[1] 0 0 0 0 0 0 0 0 0 0 0
[[2]]$group
[1] 1 1 3 3 3 3 3 3 3 3 4
[[2]]$names
[1] "at_home" "health" "other" "services" "teacher"
[[3]]
[[3]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 6.0 7 4 5 6
[2,] 8.5 9 8 9 10
[3,] 11.0 11 11 11 14
[4,] 13.5 14 13 13 16
[5,] 19.0 18 19 18 19
[[3]]$n
[1] 20 18 217 111 29
[[3]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.233506 9.137952 10.46371 10.40013 12.23961
[2,] 12.766494 12.862048 11.53629 11.59987 15.76039
[[3]]$out
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20
[26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[[3]]$group
[1] 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 5 5
[39] 5
[[3]]$names
[1] "at_home" "health" "other" "services" "teacher"
[[4]]
[[4]]$stats
[,1] [,2] [,3] [,4] [,5]
[1,] 1.666667 6.666667 1.333333 2.333333 3.000000
[2,] 8.333333 9.000000 8.000000 8.666667 9.666667
[3,] 11.333333 11.166667 10.333333 10.333333 12.666667
[4,] 14.000000 14.000000 13.000000 13.000000 15.666667
[5,] 18.333333 17.333333 18.666667 19.333333 18.666667
[[4]]$n
[1] 20 18 217 111 29
[[4]]$conf
[,1] [,2] [,3] [,4] [,5]
[1,] 9.331307 9.304619 9.797046 9.683476 10.90627
[2,] 13.335360 13.028715 10.869620 10.983190 14.42706
[[4]]$out
numeric(0)
[[4]]$group
numeric(0)
[[4]]$names
[1] "at_home" "health" "other" "services" "teacher"
-> Students whose fathers are teachers perform best.
-> Students whose fathers work in "other" types of jobs perform worst.
TrainFlag
FALSE TRUE
at_home 5 15
health 4 14
other 54 163
services 28 83
teacher 7 22
$Fold01
[1] 0.3157895
$Fold02
[1] 0.55
$Fold03
[1] 0.3684211
$Fold04
[1] 0.5263158
$Fold05
[1] 0.4736842
$Fold06
[1] 0.45
$Fold07
[1] 0.3
$Fold08
[1] 0.4736842
$Fold09
[1] 0.5
$Fold10
[1] 0.5263158
$Fold11
[1] 0.55
$Fold12
[1] 0.6
$Fold13
[1] 0.5238095
$Fold14
[1] 0.4
$Fold15
[1] 0.2105263
$Fold16
[1] 0.5714286
$Fold17
[1] 0.45
$Fold18
[1] 0.4285714
$Fold19
[1] 0.2105263
$Fold20
[1] 0.4
[1] 0.3157895 0.5500000 0.3684211 0.5263158 0.4736842 0.4500000 0.3000000
[8] 0.4736842 0.5000000 0.5263158 0.5500000 0.6000000 0.5238095 0.4000000
[15] 0.2105263 0.5714286 0.4500000 0.4285714 0.2105263 0.4000000
[1] 0.4414536
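-> Training data 75%, test data 25%
-> Over the 20 runs listed above, the best accuracy is 0.60, the worst is 0.21, and the mean is 0.44; accuracy is higher than in the mother's-job case.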
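The best, worst, and mean accuracy for both cases can be read off the per-fold values printed above. A small Python sketch that recomputes them from those values (copied as printed; nothing is re-fitted):

```python
from statistics import mean

# Per-fold accuracies as printed for the mother's-job case.
mjob_acc = [
    0.2500000, 0.4500000, 0.3000000, 0.3500000, 0.4000000, 0.1500000, 0.3500000,
    0.1904762, 0.3000000, 0.3500000, 0.4210526, 0.2105263, 0.1500000, 0.2000000,
    0.5555556, 0.3157895, 0.4500000, 0.1500000, 0.2500000, 0.4210526,
]
# Per-fold accuracies as printed for the father's-job case.
fjob_acc = [
    0.3157895, 0.5500000, 0.3684211, 0.5263158, 0.4736842, 0.4500000, 0.3000000,
    0.4736842, 0.5000000, 0.5263158, 0.5500000, 0.6000000, 0.5238095, 0.4000000,
    0.2105263, 0.5714286, 0.4500000, 0.4285714, 0.2105263, 0.4000000,
]


def summarize(acc):
    """Best, worst, and mean accuracy over the folds."""
    return {"best": max(acc), "worst": min(acc), "mean": mean(acc)}


mjob_summary = summarize(mjob_acc)
fjob_summary = summarize(fjob_acc)
print(mjob_summary)
print(fjob_summary)
```

The recomputed means agree with the last value printed in each block (0.3107226 and 0.4414536).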
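Summary
1. The tests on binary fields show that both sex and home address type are associated with mathematics grades:
-> males score higher than females, and urban students score higher than rural students.
2. The tests on categorical fields show that students perform best when the mother's job is "health care related" or the father's job is "teacher".
3. The father's-job case gives a higher mean accuracy (0.44) than the mother's-job case (0.31), which is read here as father's job having the stronger link with grades. Note, however, that the one-way ANOVA for Fjob was not significant (p = 0.103), while the one for Mjob was (p = 0.00158).
4. Next, association rule analysis is used to find combinations of factors related to grade performance.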
[1] "level2" "level2" "level2" "level3" "level2" "level3" "level3" "level2"
[9] "level4" "level3" "level2" "level3" "level3" "level3" "level4" "level3"
[17] "level3" "level2" "level2" "level2" "level3" "level3" "level4" "level3"
[25] "level2" "level2" "level3" "level4" "level3" "level3" "level3" "level4"
[33] "level4" "level2" "level3" "level2" "level4" "level4" "level3" "level3"
[41] "level2" "level3" "level4" "level2" "level2" "level2" "level3" "level4"
[49] "level3" "level2" "level3" "level3" "level3" "level2" "level3" "level2"
[57] "level3" "level3" "level2" "level4" "level3" "level2" "level2" "level2"
[65] "level2" "level4" "level3" "level2" "level2" "level4" "level3" "level2"
[73] "level2" "level3" "level3" "level2" "level3" "level3" "level2" "level1"
[81] "level3" "level3" "level2" "level3" "level2" "level2" "level2" "level3"
[89] "level3" "level2" "level2" "level4" "level2" "level3" "level3" "level2"
[97] "level3" "level2" "level3" "level2" "level2" "level4" "level3" "level2"
[105] "level4" "level3" "level2" "level4" "level3" "level3" "level4" "level2"
[113] "level3" "level4" "level2" "level4" "level3" "level3" "level2" "level3"
[121] "level4" "level3" "level3" "level3" "level2" "level3" "level2" "level2"
[129] "level1" "level4" "level1" "level1" "level3" "level3" "level1" "level1"
[137] "level1" "level1" "level3" "level4" "level2" "level2" "level3" "level3"
[145] "level1" "level2" "level1" "level3" "level1" "level2" "level1" "level3"
[153] "level2" "level1" "level3" "level2" "level3" "level2" "level4" "level3"
[161] "level1" "level2" "level1" "level2" "level2" "level3" "level2" "level3"
[169] "level1" "level3" "level1" "level3" "level3" "level1" "level2" "level2"
[177] "level3" "level2" "level2" "level3" "level2" "level3" "level4" "level2"
[185] "level3" "level3" "level3" "level3" "level2" "level2" "level3" "level2"
[193] "level2" "level2" "level3" "level3" "level4" "level2" "level4" "level2"
[201] "level4" "level2" "level2" "level2" "level3" "level2" "level2" "level3"
[209] "level2" "level2" "level2" "level3" "level3" "level2" "level2" "level3"
[217] "level2" "level2" "level2" "level2" "level2" "level1" "level4" "level3"
[225] "level3" "level2" "level4" "level3" "level2" "level3" "level3" "level3"
[233] "level2" "level3" "level2" "level2" "level3" "level3" "level3" "level1"
[241] "level3" "level3" "level1" "level3" "level1" "level4" "level3" "level2"
[249] "level1" "level3" "level2" "level2" "level2" "level2" "level3" "level2"
[257] "level3" "level3" "level3" "level2" "level4" "level2" "level3" "level2"
[265] "level2" "level4" "level2" "level3" "level2" "level1" "level2" "level3"
[273] "level3" "level3" "level2" "level3" "level2" "level2" "level2" "level3"
[281] "level2" "level2" "level3" "level2" "level2" "level3" "level4" "level3"
[289] "level3" "level3" "level3" "level3" "level3" "level4" "level3" "level3"
[297] "level2" "level2" "level3" "level4" "level3" "level3" "level3" "level4"
[305] "level3" "level3" "level4" "level2" "level3" "level3" "level2" "level3"
[313] "level3" "level3" "level3" "level3" "level2" "level2" "level3" "level3"
[321] "level3" "level2" "level3" "level3" "level4" "level3" "level3" "level3"
[329] "level2" "level3" "level2" "level3" "level1" "level2" "level2" "level4"
[337] "level3" "level1" "level4" "level2" "level3" "level2" "level4" "level2"
[345] "level3" "level3" "level4" "level2" "level3" "level3" "level2" "level3"
[353] "level2" "level2" "level3" "level2" "level3" "level3" "level2" "level4"
[361] "level3" "level3" "level3" "level4" "level3" "level2" "level3" "level1"
[369] "level3" "level3" "level2" "level3" "level3" "level2" "level4" "level2"
[377] "level3" "level2" "level3" "level2" "level3" "level2" "level3" "level1"
[385] "level2" "level2" "level2" "level1" "level2" "level1" "level2" "level4"
[393] "level2" "level3" "level2"
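-> Grade averages can range from 0 to 20 and are concentrated mainly between 6 and 14; to distinguish levels of performance, they are binned into four levels (level1 to level4), as listed above.
[1] "18~19y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"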
[9] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[17] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[25] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[33] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[41] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[49] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[57] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[65] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[73] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[81] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[89] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[97] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[105] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[113] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[121] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y"
[129] "18~19y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[137] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[145] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y"
[153] "15~17y" "18~19y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y"
[161] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[169] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[177] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[185] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[193] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[201] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[209] "15~17y" "15~17y" "18~19y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y"
[217] "15~17y" "18~19y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y"
[225] "15~17y" "18~19y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y" "15~17y"
[233] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y"
[241] "15~17y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y" "20~22y"
[249] "18~19y" "15~17y" "18~19y" "15~17y" "18~19y" "15~17y" "15~17y" "15~17y"
[257] "15~17y" "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "15~17y"
[265] "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y"
[273] "18~19y" "15~17y" "15~17y" "15~17y" "18~19y" "18~19y" "18~19y" "18~19y"
[281] "15~17y" "15~17y" "18~19y" "18~19y" "15~17y" "15~17y" "18~19y" "15~17y"
[289] "18~19y" "18~19y" "18~19y" "15~17y" "18~19y" "15~17y" "18~19y" "15~17y"
[297] "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "15~17y" "15~17y"
[305] "18~19y" "18~19y" "20~22y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y"
[313] "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "18~19y"
[321] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y"
[329] "15~17y" "15~17y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "15~17y"
[337] "18~19y" "15~17y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "15~17y"
[345] "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "15~17y"
[353] "18~19y" "18~19y" "15~17y" "18~19y" "15~17y" "15~17y" "18~19y" "18~19y"
[361] "18~19y" "18~19y" "18~19y" "15~17y" "15~17y" "18~19y" "18~19y" "15~17y"
[369] "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "15~17y" "18~19y" "18~19y"
[377] "20~22y" "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "15~17y" "18~19y"
[385] "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "20~22y" "15~17y"
[393] "20~22y" "18~19y" "18~19y"
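-> Ages range from 15 to 22 and are concentrated mostly between 16 and 18; to separate secondary-school-age students from university-age students (first and second year, third and fourth year), age is binned into three levels: 15~17y, 18~19y, and 20~22y.
[1] 395
-> Although the documented range of absences is 0 to 93, the maximum in this dataset is 75, and the median and Q1 to Q3 show that most values are 10 or below, so the levels are:
0~3 absences
4~6 absences
7~10 absences
11 or more absences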
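The absence and age binning can be sketched in Python as below. The cut points and labels are taken from the level list above and from the factor levels printed in the report ("0~3", "4~6", "7~10", "11up"; "15~17y", "18~19y", "20~22y"). The function names are hypothetical, and the cut points for the grade levels are not stated in the report, so they are left out.

```python
def absence_level(n):
    """Map a number of absences to the four levels used in the report."""
    if n <= 3:
        return "0~3"
    if n <= 6:
        return "4~6"
    if n <= 10:
        return "7~10"
    return "11up"


def age_level(age):
    """Map an age (15 to 22 in this dataset) to the three age bands."""
    if age <= 17:
        return "15~17y"
    if age <= 19:
        return "18~19y"
    return "20~22y"


# First ten students as printed in the str() listing of the raw data.
absences = [6, 4, 10, 2, 4, 10, 0, 6, 0, 0]
ages = [18, 17, 15, 15, 16, 16, 16, 17, 15, 15]

absence_levels = [absence_level(n) for n in absences]
age_levels = [age_level(a) for a in ages]
print(absence_levels)
print(age_levels)
```

For these ten students the sketch reproduces the bands shown in the report's own listings, for example 18 maps to "18~19y" and 10 absences to "7~10".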
'data.frame': 395 obs. of 31 variables:
$ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
$ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
$ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
$ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
$ Medu : Factor w/ 5 levels "0","1","2","3",..: 5 2 2 5 4 5 3 5 4 4 ...
$ Fedu : Factor w/ 5 levels "0","1","2","3",..: 5 2 2 3 4 4 3 5 3 5 ...
$ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
$ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
$ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
$ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
$ traveltime: Factor w/ 4 levels "1","2","3","4": 2 1 1 1 1 1 1 2 1 1 ...
$ studytime : Factor w/ 4 levels "1","2","3","4": 2 2 2 3 2 2 2 2 2 2 ...
$ failures : Factor w/ 4 levels "0","1","2","3": 1 1 4 1 1 1 1 1 1 1 ...
$ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
$ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
$ paid : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 1 1 2 2 ...
$ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
$ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
$ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
$ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
$ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
$ famrel : Factor w/ 5 levels "1","2","3","4",..: 4 5 4 3 4 5 4 4 4 5 ...
$ freetime : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 2 3 4 4 1 2 5 ...
$ goout : Factor w/ 5 levels "1","2","3","4",..: 4 3 2 2 2 2 4 4 2 1 ...
$ Dalc : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 1 1 1 1 1 1 ...
$ Walc : Factor w/ 5 levels "1","2","3","4",..: 1 1 3 1 2 2 1 1 1 1 ...
$ health : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 5 5 5 3 1 1 5 ...
$ glevel : Factor w/ 4 levels "level1","level2",..: 2 2 2 3 2 3 3 2 4 3 ...
$ agelevel : Factor w/ 3 levels "15~17y","18~19y",..: 2 1 1 1 1 1 1 1 1 1 ...
$ abslevel : Factor w/ 4 levels "0~3","11up","4~6",..: 3 3 4 1 3 4 1 3 1 1 ...
All variables are converted to factors.
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.5 0.1 1 none FALSE TRUE 5 0.3 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 118
set item appearances ...[4 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [36 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 done [0.02s].
writing ... [8 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
lhs rhs support confidence coverage lift count
[1] {failures=0,
schoolsup=no,
higher=yes,
internet=yes} => {glevel=level3} 0.3012658 0.5242291 0.5746835 1.225269 119
[2] {failures=0,
schoolsup=no,
internet=yes} => {glevel=level3} 0.3063291 0.5170940 0.5924051 1.208593 121
[3] {school=GP,
failures=0,
schoolsup=no,
higher=yes} => {glevel=level3} 0.3012658 0.5085470 0.5924051 1.188616 119
[4] {failures=0,
schoolsup=no,
higher=yes} => {glevel=level3} 0.3392405 0.5056604 0.6708861 1.181869 134
[5] {Pstatus=T,
failures=0,
schoolsup=no,
higher=yes} => {glevel=level3} 0.3063291 0.5041667 0.6075949 1.178378 121
[6] {failures=0,
schoolsup=no} => {glevel=level3} 0.3443038 0.5000000 0.6886076 1.168639 136
[7] {school=GP,
failures=0,
schoolsup=no} => {glevel=level3} 0.3037975 0.5000000 0.6075949 1.168639 120
[8] {Pstatus=T,
failures=0,
schoolsup=no} => {glevel=level3} 0.3113924 0.5000000 0.6227848 1.168639 123
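The columns of the rule table are simple ratios of counts over the 395 transactions. The Python sketch below rebuilds them for rule [6], {failures=0, schoolsup=no} => {glevel=level3}. Only the rule count of 136 is printed directly; the left-hand-side count (272) and the number of level3 students (169) are back-calculated here from the printed coverage and lift, so treat them as assumptions.

```python
N = 395  # number of transactions (students)


def rule_metrics(count_both, count_lhs, count_rhs, n=N):
    """Return (support, coverage, confidence, lift) for a rule lhs => rhs."""
    support = count_both / n             # share of rows matching lhs and rhs
    coverage = count_lhs / n             # share of rows matching lhs
    confidence = count_both / count_lhs  # P(rhs | lhs)
    lift = confidence / (count_rhs / n)  # confidence relative to P(rhs)
    return support, coverage, confidence, lift


# Rule [6]: count = 136 (printed); 272 and 169 are inferred, see lead-in.
support, coverage, confidence, lift = rule_metrics(136, 272, 169)
print(round(support, 7), round(coverage, 7), round(confidence, 7), round(lift, 6))
```

A lift of about 1.17 means students with no past failures and no extra school support fall in level3 only about 17% more often than students overall, so the rules with the 0.3 support threshold describe the majority profile more than a strong association.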
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.5 0.1 1 none FALSE TRUE 5 0.1 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 39
set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [82 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.52s].
writing ... [32 rule(s)] done [0.01s].
creating S4 object ... done [0.01s].
| | lhs | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|
| [1] | {school=GP, sex=F, studytime=2, agelevel=15~17y} | {glevel=level2} | 0.1012658 | 0.5333333 | 0.1898734 | 1.413870 | 40 |
| [2] | {school=GP, sex=F, studytime=2, higher=yes, agelevel=15~17y} | {glevel=level2} | 0.1012658 | 0.5333333 | 0.1898734 | 1.413870 | 40 |
| [3] | {sex=F, traveltime=1, studytime=2} | {glevel=level2} | 0.1012658 | 0.5263158 | 0.1924051 | 1.395267 | 40 |
| [4] | {sex=F, Fjob=other, famsup=yes} | {glevel=level2} | 0.1012658 | 0.5263158 | 0.1924051 | 1.395267 | 40 |
| [5] | {school=GP, sex=F, studytime=2, higher=yes} | {glevel=level2} | 0.1265823 | 0.5263158 | 0.2405063 | 1.395267 | 50 |
| [6] | {school=GP, sex=F, studytime=2, nursery=yes, higher=yes} | {glevel=level2} | 0.1012658 | 0.5263158 | 0.1924051 | 1.395267 | 40 |
| [7] | {school=GP, sex=F, studytime=2} | {glevel=level2} | 0.1316456 | 0.5252525 | 0.2506329 | 1.392448 | 52 |
| [8] | {school=GP, sex=F, address=U, studytime=2, higher=yes} | {glevel=level2} | 0.1139241 | 0.5172414 | 0.2202532 | 1.371210 | 45 |
| [9] | {school=GP, sex=F, address=U, studytime=2} | {glevel=level2} | 0.1164557 | 0.5168539 | 0.2253165 | 1.370183 | 46 |
| [10] | {sex=F, Fjob=other, agelevel=15~17y} | {glevel=level2} | 0.1012658 | 0.5128205 | 0.1974684 | 1.359491 | 40 |
| [11] | {school=GP, sex=F, studytime=2, nursery=yes} | {glevel=level2} | 0.1012658 | 0.5128205 | 0.1974684 | 1.359491 | 40 |
| [12] | {sex=F, Fjob=other, higher=yes, agelevel=15~17y} | {glevel=level2} | 0.1012658 | 0.5128205 | 0.1974684 | 1.359491 | 40 |
| [13] | {sex=F, studytime=2, agelevel=15~17y} | {glevel=level2} | 0.1037975 | 0.5125000 | 0.2025316 | 1.358641 | 41 |
| [14] | {sex=F, studytime=2, higher=yes, agelevel=15~17y} | {glevel=level2} | 0.1037975 | 0.5125000 | 0.2025316 | 1.358641 | 41 |
| [15] | {Fedu=1} | {glevel=level2} | 0.1063291 | 0.5121951 | 0.2075949 | 1.357833 | 42 |
| [16] | {school=GP, sex=F, paid=yes, nursery=yes} | {glevel=level2} | 0.1088608 | 0.5119048 | 0.2126582 | 1.357063 | 43 |
| [17] | {sex=F, Pstatus=T, paid=yes, nursery=yes} | {glevel=level2} | 0.1088608 | 0.5119048 | 0.2126582 | 1.357063 | 43 |
| [18] | {school=GP, sex=F, paid=yes, nursery=yes, higher=yes} | {glevel=level2} | 0.1088608 | 0.5119048 | 0.2126582 | 1.357063 | 43 |
| [19] | {sex=F, Pstatus=T, paid=yes, nursery=yes, higher=yes} | {glevel=level2} | 0.1088608 | 0.5119048 | 0.2126582 | 1.357063 | 43 |
| [20] | {school=GP, sex=F, Pstatus=T, studytime=2, higher=yes} | {glevel=level2} | 0.1037975 | 0.5061728 | 0.2050633 | 1.341868 | 41 |
| [21] | {sex=F, studytime=2, nursery=yes, higher=yes} | {glevel=level2} | 0.1113924 | 0.5057471 | 0.2202532 | 1.340739 | 44 |
| [22] | {sex=F, paid=yes, nursery=yes} | {glevel=level2} | 0.1189873 | 0.5053763 | 0.2354430 | 1.339756 | 47 |
| [23] | {sex=F, paid=yes, nursery=yes, higher=yes} | {glevel=level2} | 0.1189873 | 0.5053763 | 0.2354430 | 1.339756 | 47 |
| [24] | {sex=F, address=U, studytime=2, higher=yes} | {glevel=level2} | 0.1215190 | 0.5052632 | 0.2405063 | 1.339456 | 48 |
| [25] | {sex=F, address=U, studytime=2} | {glevel=level2} | 0.1240506 | 0.5051546 | 0.2455696 | 1.339168 | 49 |
| [26] | {sex=F, studytime=2, higher=yes} | {glevel=level2} | 0.1392405 | 0.5045872 | 0.2759494 | 1.337664 | 55 |
| [27] | {sex=F, studytime=2} | {glevel=level2} | 0.1443038 | 0.5044248 | 0.2860759 | 1.337233 | 57 |
| [28] | {higher=yes, goout=4} | {glevel=level2} | 0.1063291 | 0.5000000 | 0.2126582 | 1.325503 | 42 |
| [29] | {studytime=2, freetime=3} | {glevel=level2} | 0.1037975 | 0.5000000 | 0.2075949 | 1.325503 | 41 |
| [30] | {Fjob=other, higher=yes, freetime=3} | {glevel=level2} | 0.1037975 | 0.5000000 | 0.2075949 | 1.325503 | 41 |
| [31] | {school=GP, sex=F, Pstatus=T, studytime=2} | {glevel=level2} | 0.1063291 | 0.5000000 | 0.2126582 | 1.325503 | 42 |
| [32] | {sex=F, famsize=GT3, Fjob=other, higher=yes} | {glevel=level2} | 0.1012658 | 0.5000000 | 0.2025316 | 1.325503 | 40 |
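Many of the 32 level2 rules overlap: for example, rule [2] only adds `higher=yes` to rule [1] and has exactly the same confidence. One common way to thin such a list is to drop a rule when a more general rule (a proper subset of its lhs, same rhs) has equal or higher confidence. The Python sketch below applies that criterion to six rules copied from the output. It illustrates the idea only and is not necessarily a step the original analysis performed; the helper name `is_redundant` is hypothetical.

```python
# Selected level2 rules from the output: rule number -> (lhs items, confidence)
rules = {
    1: (frozenset({"school=GP", "sex=F", "studytime=2", "agelevel=15~17y"}), 0.5333333),
    2: (frozenset({"school=GP", "sex=F", "studytime=2", "higher=yes",
                   "agelevel=15~17y"}), 0.5333333),
    7: (frozenset({"school=GP", "sex=F", "studytime=2"}), 0.5252525),
    13: (frozenset({"sex=F", "studytime=2", "agelevel=15~17y"}), 0.5125000),
    14: (frozenset({"sex=F", "studytime=2", "higher=yes", "agelevel=15~17y"}), 0.5125000),
    27: (frozenset({"sex=F", "studytime=2"}), 0.5044248),
}


def is_redundant(rule_id, rules):
    """True if a more general rule has equal or higher confidence."""
    lhs, conf = rules[rule_id]
    return any(
        other_lhs < lhs and other_conf >= conf  # '<' on frozensets is proper subset
        for other_id, (other_lhs, other_conf) in rules.items()
        if other_id != rule_id
    )


kept = [rid for rid in rules if not is_redundant(rid, rules)]
dropped = [rid for rid in rules if is_redundant(rid, rules)]
print("kept:", kept)
print("redundant:", dropped)
```

On this subset, rules [2] and [14] are flagged because adding `higher=yes` does not change the confidence of rules [1] and [13], which suggests that `higher=yes` carries little extra information in those combinations.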
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.5 0.1 1 none FALSE TRUE 5 0.1 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 39
set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [82 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.52s].
writing ... [0 rule(s)] done [0.01s].
creating S4 object ... done [0.01s].
set of 0 rules
The number of rules is 0.
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.5 0.1 1 none FALSE TRUE 5 0.1 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 39
set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [82 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.50s].
writing ... [0 rule(s)] done [0.01s].
creating S4 object ... done [0.01s].
set of 0 rules
The number of rules is 0.
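Summary
1. Few students fall into level1 and level4, and even with the minimum support set as low as 0.1 no rules were found for either level, so the analysis focuses on grades in level2 and level3.
2. The rules above suggest that "no past course failures, no extra educational support, wanting to pursue higher education, and having internet access at home" are associated with higher grades (level3).
3. In contrast, "being female, being a secondary-school student aged 15~17, a relatively short weekly self-study time (2~5 hours), and attending the GP school" are associated with lower grades (level2).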
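Summary point 1 can be checked against the rule output itself. Because lift = confidence / support(rhs), any single rule is enough to back out how many students fall into its rhs level. The Python sketch below does this for level3 rule [6] and level2 rule [28]. The resulting counts are inferred from the rounded values printed in the output, not read from the data set, and the function name `rhs_count` is illustrative.

```python
N = 395  # number of transactions reported in the Apriori log


def rhs_count(confidence, lift, n=N):
    """Back out the number of students at the rhs level from one rule."""
    rhs_support = confidence / lift  # because lift = confidence / support(rhs)
    return round(rhs_support * n)


level3 = rhs_count(confidence=0.5000000, lift=1.168639)  # level3 rule [6]
level2 = rhs_count(confidence=0.5000000, lift=1.325503)  # level2 rule [28]
remaining = N - level2 - level3  # level1 and level4 combined (inferred)

print("level2:", level2)
print("level3:", level3)
print("level1 + level4:", remaining)
```

Under this inference, level1 and level4 share the students left over after level2 and level3. With a minimum support of 0.1, a rule for either level needs about 40 students (0.1 × 395 = 39.5) who match both the lhs and that level, which is hard to reach in such small groups.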
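Conclusion
1. Gender affects grades: female students tend to score lower in mathematics.
2. Family background affects grades: family background (parents' education level and occupation) is an important factor in students' mathematics performance.
3. Parents' occupation: having a father who works as a teacher is associated with higher grades, possibly because of stricter discipline and direct help with the child's studies.
4. Student's own motivation: students who want to continue to higher education and have internet access at home tend to score higher, presumably because they are more motivated to study on their own and can use online resources.
5. Course difficulty and age: little self-study time combined with a younger age goes with weaker grades, possibly because the course material is relatively difficult for students in the lower grades.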